
@malte-j commented on Jun 26, 2025

Using mmap actually slows down loading when all layers are offloaded to the GPU and VRAM is smaller than RAM. I tested this with an A100 and Gemma3. These are the results:

RAM     mmap   Load time
40 GB   no     14.3 s
40 GB   yes    33.677 s
8 GB    no     14.490 s
8 GB    yes    1.07 min
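To put the table in relative terms, the slowdown caused by mmap can be computed as the ratio of the load time with mmap to the load time without it. This is a minimal Python sketch, assuming the "1.07 min" entry means 1.07 minutes (64.2 s); the names `results` and `mmap_slowdown` are hypothetical and only for illustration.

```python
# Load times from the benchmark table, converted to seconds.
# Assumption: the 8 GB / mmap entry "1.07 min" means 1.07 minutes.
results = {
    # (RAM in GB, mmap enabled): load time in seconds
    (40, False): 14.3,
    (40, True): 33.677,
    (8, False): 14.490,
    (8, True): 1.07 * 60,
}


def mmap_slowdown(ram_gb: int) -> float:
    """Hypothetical helper: load time with mmap divided by load time without mmap."""
    return results[(ram_gb, True)] / results[(ram_gb, False)]


for ram in (40, 8):
    print(f"{ram} GB RAM: loading with mmap is {mmap_slowdown(ram):.2f}x slower")
```

On these numbers, mmap makes loading roughly 2.4x slower with 40 GB of RAM and roughly 4.4x slower with 8 GB.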

@malte-j marked this pull request as ready for review on June 26, 2025, 20:23
@malte-j requested a review from ngxson as a code owner on June 26, 2025, 20:23
@ngxson (Collaborator) left a comment


This table is auto-generated. Any changes made here will be discarded the next time it is regenerated.

@malte-j closed this on Jun 29, 2025